Skip to content

feat(chart): opt-in Argo Rollouts (blue-green) + pre-deploy migration Job#3653

Open
nicacioliveira wants to merge 3 commits into
mainfrom
feat/chart-argo-rollouts-bluegreen
Open

feat(chart): opt-in Argo Rollouts (blue-green) + pre-deploy migration Job#3653
nicacioliveira wants to merge 3 commits into
mainfrom
feat/chart-argo-rollouts-bluegreen

Conversation

@nicacioliveira

@nicacioliveira nicacioliveira commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Summary

Two opt-in flags in the studio Helm chart so consumers can switch from the default Deployment to an Argo Rollouts Rollout (blue-green or canary), and move DB migrations out of pod startup into a dedicated pre-sync Job.

Both default to off — existing installs keep the exact same Deployment with the on-startup migration path. No selector / label changes, no PVC churn.

What's new

  • `argoRollouts.enabled` — render a `Rollout` (argoproj.io/v1alpha1) instead of `Deployment`. Pod template is shared via the new `chart-deco-studio.podTemplate` helper so the two workload kinds describe an identical pod surface. Supports `blueGreen` (default) and `canary` strategies.

  • `migrationJob.enabled` — render a Job that runs `bun run --cwd=apps/mesh migrate` ONCE before pods start. Carries BOTH `helm.sh/hook: pre-install,pre-upgrade` and `argocd.argoproj.io/hook: PreSync` annotations so it sequences correctly whether installed via `helm upgrade` directly or synced by ArgoCD. The runtime pod command gets `--skip-migrations` appended (the studio CLI already exposes this flag — see `apps/mesh/src/cli.ts`), eliminating the race between N replicas migrating concurrently and giving a clear pre-deploy gate: migration Job fails → release aborted.

Files

File Change
`templates/_pod-template.tpl` new — shared pod template helper + `podCommand` helper that appends `--skip-migrations` when migrationJob is on
`templates/deployment.yaml` wraps in `{{- if not argoRollouts.enabled }}` and references the helper; lifts `spec.template` body out
`templates/rollout.yaml` new — gated on `argoRollouts.enabled`, blueGreen by default; mutual-exclusion `fail` for both-on configs
`templates/migration-job.yaml` new — gated on `migrationJob.enabled`. Sync-wave -1, dual helm + ArgoCD hooks, bounded backoffLimit/activeDeadlineSeconds/TTL
`templates/service-preview.yaml` new — preview Service for blue-green; Argo Rollouts manages its selector
`values.yaml` adds `argoRollouts` and `migrationJob` blocks; both off by default

Why opt-in

The chart is open-source and not every consumer has the argo-rollouts controller installed. Defaulting to Deployment keeps zero requirements on the consumer's cluster. Internal CD (deco-apps-cd) flips both flags on for the deco-studio / deco-studio-stg releases — that's a separate change.

Important: migration discipline note

Blue-green amplifies the schema/code overlap window. The Job moves migrations to a single execution point BEFORE the new ReplicaSet probes, but it does NOT make destructive DDL safe — the old (blue) ReplicaSet still serves traffic during the overlap window with the migrated schema. Destructive changes (DROP/RENAME/type changes) still require expand-contract discipline at the migration code level. This is independent of the chart and is being handled team-side as a code/review practice.

Test plan

  • `helm lint deploy/helm/studio` passes
  • Default render (no flags) produces the same workload surface as before:
    • Deployment with command `bun run deco --no-local-mode`
    • No Rollout, no preview Service, no migration Job
  • `helm template ... --set argoRollouts.enabled=true --set migrationJob.enabled=true` produces:
    • Rollout with `blueGreen` strategy, `activeService: deco-studio`, `previewService: deco-studio-preview`, `scaleDownDelaySeconds: 30`, `autoPromotionEnabled: false`
    • Preview Service
    • Migration Job with PreSync hooks and `bun run --cwd=apps/mesh migrate` command
    • Pod command has `--skip-migrations` appended
  • Canary mode renders (`--set argoRollouts.strategy.blueGreen.enabled=false --set argoRollouts.strategy.canary.enabled=true`)
  • Reviewer: install into a test cluster, verify the Rollout transitions correctly through preview → promotion

Summary by cubic

Adds opt-in Argo Rollouts (blue-green/canary) to the studio Helm chart via a shared pod template. Off by default; keeps the Deployment path unchanged and drops the previously proposed migration Job.

  • New Features

    • argoRollouts.enabled: renders a Rollout (argoproj.io/v1alpha1) instead of a Deployment; blue-green by default, canary supported.
    • Preview Service for blue-green; Argo manages active/preview selectors.
    • Shared _pod-template.tpl so Deployment and Rollout use the same pod spec.
  • Bug Fixes

    • HPA targets the Rollout when rollouts are enabled; otherwise the Deployment.
    • Install fails if both or neither of blue-green/canary are enabled.
    • Honors terminationGracePeriodSeconds: 0 via hasKey check.

Written for commit b78e0e3. Summary will update on new commits.

Review in cubic

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 issues found across 6 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="deploy/helm/studio/templates/deployment.yaml">

<violation number="1" location="deploy/helm/studio/templates/deployment.yaml:1">
P1: Enabling Argo Rollouts removes the Deployment, but HPA still targets Deployment, leaving autoscaling with an invalid target.</violation>
</file>

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

Comment thread deploy/helm/studio/templates/migration-job.yaml Outdated
@@ -1,3 +1,4 @@
{{- if not (and .Values.argoRollouts .Values.argoRollouts.enabled) }}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Enabling Argo Rollouts removes the Deployment, but HPA still targets Deployment, leaving autoscaling with an invalid target.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At deploy/helm/studio/templates/deployment.yaml, line 1:

<comment>Enabling Argo Rollouts removes the Deployment, but HPA still targets Deployment, leaving autoscaling with an invalid target.</comment>

<file context>
@@ -1,3 +1,4 @@
+{{- if not (and .Values.argoRollouts .Values.argoRollouts.enabled) }}
 apiVersion: apps/v1
 kind: Deployment
</file context>

Comment thread deploy/helm/studio/templates/rollout.yaml
Comment thread deploy/helm/studio/templates/_pod-template.tpl Outdated
Comment thread deploy/helm/studio/templates/_pod-template.tpl Outdated
nicacioliveira added a commit that referenced this pull request Jun 8, 2026
Five issues caught in PR review #3653, fixed without changing the opt-in
defaults or expanding scope outside the chart:

1. P1 — HPA targeted Deployment even when argoRollouts.enabled (so the
   Deployment didn't exist and autoscaling was silently broken). hpa.yaml
   now switches `scaleTargetRef` to `argoproj.io/v1alpha1 Rollout` when
   rollouts are on.

2. P1 — Migration Job (PreSync, sync-wave -1) ran before its envFrom
   ConfigMap/Secret on first install/sync. Dropped the helm
   pre-install/pre-upgrade hook (helm hooks run before non-hook resources,
   which caused the ordering bug on first install), kept only ArgoCD's
   PreSync + sync-wave -1. ConfigMap and Secret now get
   `argocd.argoproj.io/sync-wave: "-2"` annotations conditionally when
   migrationJob.enabled, so they materialize before the Job. For plain
   `helm install` (no ArgoCD), documented that `--wait` should be used so
   helm waits for both Job completion and Deployment readiness together.

3. P2 — Strategy selection silently fell back to blueGreen when both
   blueGreen.enabled and canary.enabled were false. Added an explicit
   `fail` in rollout.yaml requiring exactly one to be true (matching the
   existing both-on fail).

4. P2 — `terminationGracePeriodSeconds` used a truthiness check that
   silently dropped explicit `0` (immediate termination). Switched to
   `hasKey` so an explicit `0` is honored.

5. P2 — `--skip-migrations` was unconditionally appended to any custom
   `image.command` when migrationJob.enabled, which would break or be
   meaningless on non-deco entrypoints (e.g. debug containers running
   `sleep infinity`). Added `migrationJob.injectSkipMigrationsFlag`
   (default true) so the auto-append can be turned off when the user
   overrides `image.command`.

Verification:
- `helm lint deploy/helm/studio` clean
- Default render unchanged (no Rollout / no preview Service / no Job /
  no --skip-migrations / pod command identical)
- With `argoRollouts.enabled=true autoscaling.enabled=true`: HPA targets
  the Rollout (argoproj.io/v1alpha1) instead of the missing Deployment
- With `migrationJob.enabled=true`: ConfigMap + Secret both carry
  `sync-wave: "-2"`; Job carries PreSync + `sync-wave: "-1"`
- With `argoRollouts.enabled=true` + both strategies disabled: helm
  template fails with a clear message
- With `migrationJob.injectSkipMigrationsFlag=false`: `--skip-migrations`
  is NOT appended to the pod command

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
nicacioliveira and others added 3 commits June 8, 2026 10:37
… Job

Adds two opt-in flags to the studio Helm chart so consumers can switch from
the default Deployment to an Argo Rollouts Rollout with blue-green strategy,
and move DB migrations out of pod startup into a dedicated pre-sync Job.

Both default to off — existing installs keep the exact same Deployment with
the on-startup migration path. No selector / label changes, no PVC churn.

## What's new

- `argoRollouts.enabled` — when true, render a `Rollout` (argoproj.io/v1alpha1)
  instead of the `Deployment`. Pod template is shared via the new
  `chart-deco-studio.podTemplate` helper so the two workload kinds describe an
  identical pod surface. Supports `blueGreen` (default) and `canary` strategies.

- `migrationJob.enabled` — when true, render a Job that runs
  `bun run --cwd=apps/mesh migrate` ONCE before pods start. Carries BOTH
  `helm.sh/hook: pre-install,pre-upgrade` and
  `argocd.argoproj.io/hook: PreSync` annotations so it sequences correctly
  whether installed via `helm upgrade` directly or synced by ArgoCD. The
  runtime pod command gets `--skip-migrations` appended (the studio CLI
  already exposes this flag — see `apps/mesh/src/cli.ts`), eliminating the
  race between N replicas migrating concurrently and giving a clear
  pre-deploy gate: migration Job fails → release aborted.

## New / modified files

- `templates/_pod-template.tpl` (new) — shared pod template helper + the
  `podCommand` helper that appends `--skip-migrations` when migrationJob is on.
- `templates/deployment.yaml` — now wraps in `{{- if not argoRollouts.enabled }}`
  and references the helper; lifts the entire `spec.template` body out.
- `templates/rollout.yaml` (new) — gated on `argoRollouts.enabled`, mirrors
  the Deployment via the same helper, picks blueGreen or canary based on
  values. Mutual-exclusion `fail` for both-on configs.
- `templates/migration-job.yaml` (new) — gated on `migrationJob.enabled`.
  Sync-wave -1, dual hooks, bounded backoffLimit/activeDeadlineSeconds/TTL.
- `templates/service-preview.yaml` (new) — rendered only for blue-green;
  Argo Rollouts manages its selector to point at the preview ReplicaSet.
- `values.yaml` — adds `argoRollouts` and `migrationJob` blocks; both off
  by default, defaults preserve current behavior.

## Why opt-in

The chart is open-source and not everyone has the argo-rollouts controller
installed. Defaulting to Deployment keeps zero requirements on the consumer's
cluster. Internal CD (deco-apps-cd) flips both flags on for the deco-studio /
deco-studio-stg releases — that's a separate change.

## Migration discipline note

Blue-green amplifies the schema/code overlap window. The Job moves migrations
to a single execution point BEFORE the new ReplicaSet probes, but it does NOT
make destructive DDL safe — the old (blue) ReplicaSet still serves traffic
during the overlap window with the migrated schema. Destructive changes
(DROP/RENAME/type changes) still require expand-contract discipline at the
migration code level. This is independent of the chart and is being handled
team-side as a code/review practice.

## Verification

- `helm lint deploy/helm/studio` passes
- `helm template deco-studio deploy/helm/studio` (default) renders identical
  workload surface to before — Deployment with `bun run deco --no-local-mode`,
  no Rollout, no preview Service, no migration Job
- `helm template ... --set argoRollouts.enabled=true --set migrationJob.enabled=true`
  renders Rollout with blueGreen, preview Service, migration Job with the
  PreSync hooks, and `--skip-migrations` appended to the pod command

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Five issues caught in PR review #3653, fixed without changing the opt-in
defaults or expanding scope outside the chart:

1. P1 — HPA targeted Deployment even when argoRollouts.enabled (so the
   Deployment didn't exist and autoscaling was silently broken). hpa.yaml
   now switches `scaleTargetRef` to `argoproj.io/v1alpha1 Rollout` when
   rollouts are on.

2. P1 — Migration Job (PreSync, sync-wave -1) ran before its envFrom
   ConfigMap/Secret on first install/sync. Dropped the helm
   pre-install/pre-upgrade hook (helm hooks run before non-hook resources,
   which caused the ordering bug on first install), kept only ArgoCD's
   PreSync + sync-wave -1. ConfigMap and Secret now get
   `argocd.argoproj.io/sync-wave: "-2"` annotations conditionally when
   migrationJob.enabled, so they materialize before the Job. For plain
   `helm install` (no ArgoCD), documented that `--wait` should be used so
   helm waits for both Job completion and Deployment readiness together.

3. P2 — Strategy selection silently fell back to blueGreen when both
   blueGreen.enabled and canary.enabled were false. Added an explicit
   `fail` in rollout.yaml requiring exactly one to be true (matching the
   existing both-on fail).

4. P2 — `terminationGracePeriodSeconds` used a truthiness check that
   silently dropped explicit `0` (immediate termination). Switched to
   `hasKey` so an explicit `0` is honored.

5. P2 — `--skip-migrations` was unconditionally appended to any custom
   `image.command` when migrationJob.enabled, which would break or be
   meaningless on non-deco entrypoints (e.g. debug containers running
   `sleep infinity`). Added `migrationJob.injectSkipMigrationsFlag`
   (default true) so the auto-append can be turned off when the user
   overrides `image.command`.

Verification:
- `helm lint deploy/helm/studio` clean
- Default render unchanged (no Rollout / no preview Service / no Job /
  no --skip-migrations / pod command identical)
- With `argoRollouts.enabled=true autoscaling.enabled=true`: HPA targets
  the Rollout (argoproj.io/v1alpha1) instead of the missing Deployment
- With `migrationJob.enabled=true`: ConfigMap + Secret both carry
  `sync-wave: "-2"`; Job carries PreSync + `sync-wave: "-1"`
- With `argoRollouts.enabled=true` + both strategies disabled: helm
  template fails with a clear message
- With `migrationJob.injectSkipMigrationsFlag=false`: `--skip-migrations`
  is NOT appended to the pod command

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…louts only

This PR's stated scope was opt-in Argo Rollouts blue-green for the studio
chart. The migration Job was scope creep on my part — the team is handling
migration discipline (expand-contract) at the migration code level, not via
chart-side orchestration.

Removed:
- templates/migration-job.yaml (deleted)
- `--skip-migrations` auto-append helper in _pod-template.tpl
- `migrationJob` block in values.yaml
- sync-wave annotations on ConfigMap and Secret (reverted to origin/main)
- Cross-references to migrationJob in argoRollouts and Rollout comments

Kept (legitimate Argo Rollouts support):
- templates/_pod-template.tpl (shared pod spec helper for Deployment + Rollout)
- templates/rollout.yaml (blueGreen + canary, mutual-exclusion + neither-set fails)
- templates/service-preview.yaml (preview Service for blueGreen)
- templates/deployment.yaml (wrapped with `if not argoRollouts.enabled`)
- templates/hpa.yaml (targets Rollout when rollouts on; otherwise Deployment)
- templates/_pod-template.tpl: terminationGracePeriodSeconds uses hasKey so
  explicit `0` is honored
- `argoRollouts` block in values.yaml

This addresses cubic findings 1 (HPA target), 3 (strategy neither-set fail),
4 (terminationGracePeriodSeconds 0). Findings 2 (migration hook ordering) and
5 (--skip-migrations on custom commands) are no longer applicable since the
migration Job is gone entirely.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@nicacioliveira nicacioliveira force-pushed the feat/chart-argo-rollouts-bluegreen branch from 2b3f3a0 to b78e0e3 Compare June 8, 2026 14:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant